Multimodal Live Streaming - yuyan

Multimodal Live Streaming

Vision Language Model

Video Foundation Model

Large Language Model

Visual Document Understanding

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

https://github.com/OpenBMB/MiniCPM-o